# Financial Document Processing

A vision-language model designed for parsing real-world Thai and English documents, based on Qwen2.5-VL-Instruct.

Transformers, Multilingual

Qwen2 VL 2B OCR

Qwen2-VL-2B-OCR is an OCR model fine-tuned from unsloth/Qwen2-VL-2B-Instruct that specializes in extracting the complete text of documents, tables, and payroll images.

Transformers, English

OCR TextInput Base

A specialized image-to-text model for the financial domain that recognizes English text, used primarily to process images in financial documents.

Text Recognition

Transformers, English

Donut Base Finetuned Cord V2

Donut is a visual document understanding model with a Swin Transformer encoder; this checkpoint is fine-tuned on the CORD dataset to extract structured text information from images.

Tatr Tab Struct V2

A DETR-architecture model trained on the PubTables-1M and FinTabNet datasets, specialized for table structure recognition.
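Table structure recognition models of this kind typically predict boxes for rows and columns rather than for individual cells. The following is a minimal sketch of one common post-processing step, assuming the detector has already produced row and column boxes in `(x0, y0, x1, y1)` pixel form. The function name `cells_from_rows_and_columns` and the sample boxes are hypothetical and not part of the model's API.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def cells_from_rows_and_columns(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    """Build a grid of cell boxes by intersecting each row with each column."""
    # Order rows top-to-bottom and columns left-to-right so grid[r][c] is stable.
    rows = sorted(rows, key=lambda b: b[1])
    cols = sorted(cols, key=lambda b: b[0])
    grid = []
    for rx0, ry0, rx1, ry1 in rows:
        grid.append([
            # The cell is the overlap of the row box and the column box.
            (max(rx0, cx0), max(ry0, cy0), min(rx1, cx1), min(ry1, cy1))
            for cx0, cy0, cx1, cy1 in cols
        ])
    return grid


# Hypothetical detections for a 2x2 table, deliberately given out of order.
rows = [(0, 30, 200, 60), (0, 0, 200, 30)]
cols = [(100, 0, 200, 60), (0, 0, 100, 60)]
grid = cells_from_rows_and_columns(rows, cols)
```

Sorting first means the caller can index `grid[row][column]` regardless of the order in which the detector returned its boxes.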

Text Recognition

Donut Base Finetuned Invoices

A multilingual invoice processing model built on the Donut architecture that extracts key invoice fields.

Layout Xlm Base Finetuned With DocLayNet Base At Linelevel Ml384

A line-level document understanding model fine-tuned from the LayoutXLM base model on the DocLayNet dataset, supporting multilingual document layout analysis and token classification.

Text Recognition

Transformers, Multilingual

Lilt Xlm Roberta Base Finetuned With DocLayNet Base At Paragraphlevel Ml512

A paragraph-level document understanding model fine-tuned from LiLT (XLM-RoBERTa base) on the DocLayNet dataset; it analyzes document layout and content through token classification.

Text Recognition

Transformers, Multilingual

Lilt Xlm Roberta Base Finetuned With DocLayNet Base At Linelevel Ml384

A line-level document understanding model fine-tuned from LiLT on the DocLayNet dataset, supporting multilingual document layout analysis.

Transformers, Multilingual

Receipt Paper Invoice Document

An image classification model generated with HuggingPics that sorts images into four categories: receipts, paper, invoices, and documents.
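A classifier like this emits one score (logit) per category, and the predicted class is the one with the highest softmax probability. The sketch below shows that final step in plain Python. The label order in `LABELS`, the function name `pick_label`, and the sample logits are assumptions made for illustration; a real checkpoint defines its own label mapping.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label order, for illustration only.
LABELS = ["receipts", "paper", "invoices", "documents"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for a single image.
label, confidence = pick_label([0.1, 2.0, 0.3, -1.0], LABELS)
```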

Image Classification

This is a Donut model fine-tuned on the CORD-v2 dataset, designed for image-to-text tasks, achieving an average accuracy of 0.901.

OCR LayoutLMv3 Invoice

An invoice recognition model fine-tuned from LayoutLMv3-base on the wild_receipt dataset to extract structured information from invoices.

Sequence Labeling

Vit Receipts Classifier

A binary classification model based on the ViT architecture that identifies whether an image is a receipt or invoice.

Image Classification

Layoutlm Document Qa

A fine-tuned multimodal LayoutLM model for document question answering that uses both the text and the layout information in a document to answer questions.
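Layout information reaches LayoutLM-family models as one bounding box per word, and the Transformers implementations expect those boxes on a 0-1000 scale relative to the page size. The following is a minimal sketch of that normalization step, assuming boxes arrive as `(x0, y0, x1, y1)` pixel coordinates from an OCR engine. The function name `normalize_bbox` and the sample values are illustrative and not taken from this checkpoint's own pipeline.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def normalize_bbox(bbox: Tuple[float, float, float, float],
                   width: float, height: float) -> Box:
    """Scale a pixel-space box to the 0-1000 range relative to the page size."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


# Hypothetical word box on a 500 x 1000 pixel page image.
normalized = normalize_bbox((50, 100, 250, 300), width=500, height=1000)
```

Normalizing against the page size makes the coordinates independent of scan resolution, so the same document scanned at different sizes yields the same layout input.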

Transformers, English

Layoutlmv3 Cord Ner

A document understanding model fine-tuned from LayoutLMv3-base for named entity recognition on the CORD dataset.

Text Recognition

© 2025 AIbase